{% extends 'base.html' %} {% block head %}
At the end of this use case you will be able to:
Walkability is a planning concept where people can access amenities in their neighbourhoods easily on foot [1]. It is based on the idea that urban spaces should be liveable spaces for all types of users and transportation modes, and not merely designed for maximum vehicle throughput. It aims to reduce the need for cars to travel and provide citizens with healthy, safe, sustainable environments [2].
This use case is intended to provide advocates and council workers in the City of Melbourne with information that will create better policies regarding walkability.
Walkability is typically measured by identifying three things about an area [3]:
Is there something to walk to (to run errands, use parks or public transport)?
Is it easy to find direct routes when getting around?
How dense is the housing?
These factors will increase the probability of people choosing to walk for transportation.
This can be measured by the percentage of people with access to certain amenities, or by sampling an area and estimating distances to different features. Since calculating access to all different types of amenity would be difficult, a proxy for this is typically done using certain types of amenities. This can include regular public transportation (as measured by either 20 or 30 minute services), places to buy healthy food (groceries or supermarkets), and public open space larger than 1.5 hectares.
Walkability is concerned with street connectivity, as measured by the density of intersections. This is because grid-style layouts tend to provide easier access to amenities as people can walk there directly [4].
Higher population density, as measured by the population density per square kilometre, is associated with an increased need for different types of services and land uses [5] [6].
Sampling in one study [7] was done by sampling points every 30 metres along the pedestrian street network to estimate population and street intersection densities, as well as distance to several features. This study will use a grid technique to make a quick estimation for an area, which can be flexibly made smaller or larger depending on the amount of precision required.
Walkability is an important part of health, as recent research indicates neighbourhoods exceeding certain benchmarks are associated with better physical health [8].
Although Greater Melbourne falls short of walkability metrics, the City of Melbourne has a much higher walkability[9]. This use case seeks visualise the detailes about walkability in the City, and explore some extra metrics that could be used to improve walkability and thus citizen health.
Some key problems identified in a 2018 report produced for the City [10] will be addressed, with more to be mentioned in the suggestions for further research at the end of this document. On the importance of pedestrian movement in the city, this use case will include measures of delay when walking as the 2018 report notes that given the majority of internal trips in the City are on foot more consideration to pedestrian delay is warranted.
Residential dwellings (2021)
Pedestrian network
Footpath steepness
Road network
ABS Statistics about the population for Australia's capital cities and regions
Speed Zone Data
OpenStreetMap data
# Importing from the API
import requests
import os
import fiona
from io import BytesIO
from zipfile import ZipFile
# Working with the data
import numpy as np
import pandas as pd
import geopandas as gpd
import json
from sklearn.preprocessing import MinMaxScaler
# Working with shape files and geo data
import shapely
from shapely.geometry import Polygon, MultiPolygon, Point
# Visualisation
import matplotlib.pylab as plt
import folium
from folium import plugins
import seaborn as sns
import branca.colormap as cm
%matplotlib inline
This is used to trim geographical data sets to the correct area, see the green roof use case for details on this data source.
LGA_shape = gpd.read_file("./interactive_dependencies/greenroof_usecase/urban_heat/Local Government Area Solid.shp")
# Select only the relevant columns, and convert to the coordinate system that will be used for this use case
LGA_shape_gdf = LGA_shape.filter(['LGA_NAME', 'geometry']).to_crs('epsg:4326')
dwelling_url = "https://data.melbourne.vic.gov.au/explore/dataset/residential-dwellings/download/?format=geojson&refine.census_year=2021&timezone=Australia/Sydney&lang=en"
dwelling_response = requests.get(url=dwelling_url)
def to_geopandas(response:requests.models.Response):
"""
Builds a GeoDataFrame of the reponse content features.
Parameters:
response (requests.models.Response): HTTP Response object from the API
Returns:
gdf (GeoDataFrame): A GeoDataFrame
"""
b = bytes(response.content)
with fiona.BytesCollection(b) as f:
crs = f.crs['init']
gdf = gpd.GeoDataFrame.from_features(f, crs=crs)
return gdf
residential_gdf = to_geopandas(dwelling_response)
residential_gdf.head()
| geometry | longitude | property_id | building_address | clue_small_area | block_id | base_property_id | census_year | dwelling_number | latitude | dwelling_type | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | POINT (144.95651 -37.82098) | 144.95651 | 611394 | 545-557 Flinders Street MELBOURNE VIC 3000 | Melbourne (CBD) | 1 | 611394 | 2021 | 196 | -37.82098 | Residential Apartments |
| 1 | POINT (144.95649 -37.81988) | 144.95649 | 103957 | 517-537 Flinders Lane MELBOURNE VIC 3000 | Melbourne (CBD) | 11 | 103957 | 2021 | 26 | -37.81988 | Residential Apartments |
| 2 | POINT (144.95577 -37.82061) | 144.95577 | 103986 | 556-560 Flinders Street MELBOURNE VIC 3000 | Melbourne (CBD) | 11 | 103986 | 2021 | 3 | -37.82061 | Residential Apartments |
| 3 | POINT (144.95615 -37.82050) | 144.95615 | 103988 | 546-548 Flinders Street MELBOURNE VIC 3000 | Melbourne (CBD) | 11 | 103988 | 2021 | 111 | -37.82050 | Residential Apartments |
| 4 | POINT (144.95827 -37.81973) | 144.95827 | 103997 | 472-482 Flinders Street MELBOURNE VIC 3000 | Melbourne (CBD) | 12 | 103997 | 2021 | 3 | -37.81973 | Residential Apartments |
This is more accurate than the dwelling count, since the dwellings do not actually get inspected and thus could be unoccupied. There is also no information about how many people live in each dwelling. However, the ABS data is at Statistical Area 2, which roughly corresponds to the size of a suburb.
population_raw = gpd.read_file("./interactive_dependencies/walkability/SA2 ERP GeoPackage 2021 (ASGS2021).gpkg")
# only need the latest population information form the 2021 census
population_gdf = population_raw.filter(["SA4_code_2021", "SA3_code_2021", "SA2_name_2021", "ERP_2021", "Area_km2", "Pop_density_2021_people_per_km2", "geometry"])
# trim by area code
population_gdf = population_gdf[population_gdf['SA4_code_2021'] == 206]
Viewing the data we can see some are coming from outside the LGA, as seen in Statistica Areas 2 and 3 names (e.g. 'Brunswick').
If we filtered more finely using the SA3 code 20604 'Melbourne city', that will clip out some sections from the LGA. We need to visually confirm using the map what should be kept.
population_gdf.head()
| SA4_code_2021 | SA3_code_2021 | SA2_name_2021 | ERP_2021 | Area_km2 | Pop_density_2021_people_per_km2 | geometry | |
|---|---|---|---|---|---|---|---|
| 753 | 206 | 20601 | Brunswick East | 12963 | 2.1682 | 5978.691895 | POLYGON ((144.97307 -37.76388, 144.97340 -37.7... |
| 754 | 206 | 20601 | Brunswick West | 14499 | 3.1795 | 4560.150879 | POLYGON ((144.93407 -37.75969, 144.93405 -37.7... |
| 755 | 206 | 20601 | Pascoe Vale South | 10461 | 2.9887 | 3500.184082 | POLYGON ((144.93264 -37.74226, 144.93251 -37.7... |
| 756 | 206 | 20601 | Brunswick - North | 13091 | 2.4091 | 5433.979492 | POLYGON ((144.95031 -37.75810, 144.95044 -37.7... |
| 757 | 206 | 20601 | Brunswick - South | 13215 | 2.7334 | 4834.638184 | POLYGON ((144.94985 -37.77016, 144.94929 -37.7... |
We can overlay the population data with the LGA boundary on a map, and manually compare with the dwelling density records.
Let's create a map to visualise and confirm the data
# Creating a map
pop_map = folium.Map(location=[-37.8155, 144.95], tiles='cartodbpositron', min_zoom=11, zoom_start=13)
# Mapping function
def map_df(dataFrame, dfMap:folium.Map, columns:list=None, labels:list=None, description:str='Untitled',
styleFunction=None, highlightFunction=None, colorMap=None, overlay=True, show=True):
"""
Builds a layer for a data frame and adds it to a Folium map.
Parameters:
dataFrame (DataFrame or GeoDataFrame, required): The dataframe to be mapped.
dfMap (folium.Map, required): The Folium map to add the feature to.
columns (list): List of the columns to be included in the map, with the most important column at index 0, 'geometry' must be last.
labels(list): Labels/aliases to use in place of provided column names for the tooltip (don't include one for 'geometry').
description (str): A short description visible in map layers,
styleFunction (dict): The styling for the fill and outline colours.
highlightFunction (dict): The styling for the highlight.
overlay (bool): Sets as an overlay (Radio Button).
show (bool): Sets if layer shows by default when map is displayed
"""
# Add to its own feature group so it can be turned on and off.
featureGroup = folium.FeatureGroup(name=description, overlay=overlay, show=show).add_to(dfMap)
# Default style and highlight functions if none are provided:
if styleFunction is None:
styleFunction = {
'color': 'black',
'fillOpacity': 0.2,
'opacity': .3,
'weight': 1
}
if highlightFunction is None:
highlightFunction = {
'fillOpacity': 0.7,
'opacity': .5,
'color': "blue"
}
# If the columns are specified:
if columns is not None:
# Transform geo data into geoJson.
geoJSON = dataFrame[columns].copy().to_json()
# Add Layer Control name by adding the dataset name to json properties.
geoJSON = json.loads(geoJSON)
for entry in geoJSON['features']:
entry['properties']['dataset'] = description
# Add 'dataset' to the columns & labels lists for tooltip use.
columns.insert(0, 'dataset')
labels.insert(0, 'dataset')
# Uses the colour map if one is specified
if colorMap is None:
feature = folium.GeoJson(
geoJSON,
style_function=lambda feature: styleFunction,
# Controls behaviour when the user hovers the mouse over the feature
highlight_function=lambda feature: highlightFunction,
# Hover tool tip will include all information except the 'geometry' column
tooltip=folium.GeoJsonTooltip(
fields=columns[:-1],
aliases=labels)
).add_to(featureGroup) # Add the features to the group
else:
feature = folium.GeoJson(
geoJSON,
style_function=lambda feature: {
'fillColor': colorMap(feature['properties'][columns[1]]),
'fillOpacity': styleFunction['fillOpacity'],
'opacity': styleFunction['opacity'],
'weight': styleFunction['weight']
},
highlight_function=lambda feature: highlightFunction,
tooltip=folium.GeoJsonTooltip(
fields=columns[:-1],
aliases=labels)
).add_to(featureGroup)
if columns is None:
folium.GeoJson(
dataFrame,
name=description,
style_function=lambda feature: styleFunction,
).add_to(featureGroup)
First, we should add the City of Melbourne LGA border to the map so it is the bottom layer.
map_df(LGA_shape_gdf,pop_map,description="CoM LGA border", styleFunction={
'fillOpacity': 0,
'color': 'black',
'opacity': 1,
'weight': 3
})
Next, add the population for each area from the ABS statistics.
map_df(population_gdf,
pop_map,
['ERP_2021','Pop_density_2021_people_per_km2', 'SA2_name_2021','Area_km2','geometry'],
['Population', 'Pop density', 'area name' , 'Area in km2'],
"Population per SA2",
{'fillOpacity': 0.5,
'opacity': 0, #ensure there is space to see the LGA border
'weight': 3},
# use the min and max value of the variable being color mapped
colorMap=cm.linear.YlGnBu_07.scale(population_gdf['ERP_2021'].min(), population_gdf['ERP_2021'].max()))
Finally, add markers indicating the dwelling locations. To do this we will create a list by iterating over each row and storing the coordinates, as well as the dwelling numbers. FastMarkerCluster is a much higher performing way to add markers to the map compared with using the shapely geometry in the dataframe.
# Create locations for each dwelling, as indicated by their location and the dwelling numbers per site
locations = []
for row in residential_gdf.iloc:
location = [(row.latitude, row.longitude)]*row['dwelling_number']
locations += location
dwelling_fg = folium.FeatureGroup(name="Dwelling count", overlay=True, show=True).add_to(pop_map)
marker_cluster = plugins.FastMarkerCluster(
data=locations,
# Customise the binning of marker size
icon_create_function = '''
function (cluster) {
var childCount = cluster.getChildCount();
var c = ' marker-cluster-';
if (childCount < 100) {
c += 'small';
}
else if (childCount < 2000) {
c += 'medium';
}
else {
c += 'large';
}
return new L.DivIcon({
html: '<div><span>' + childCount + '</span></div>',
className: 'marker-cluster' + c,
iconSize: new L.Point(40, 40)
});
}'''
).add_to(dwelling_fg)
Looking at the map below, we can confirm that in the population density data the darker areas (more populated) also correspond with more dwellings.
folium.LayerControl().add_to(pop_map) # Add the layer control to switch layers
pop_map
The Port Melbourne industrial area we lose by clipping to the LGA has no dwellings, and so we won't lose any data accuracy by clipping, no changes needed.
We can also safely exclude Carlton North from our counts, as there is only one dwelling recorded in that area located in Princes Park and no dwellings in the cemetary.
# drop all not in Melbourne City expect the Port Melb Industrial
population_LGA = gpd.overlay(population_gdf.to_crs(4326), LGA_shape_gdf, how='intersection')
population_trim = population_LGA[(population_LGA.SA3_code_2021 == 20604) |
(population_LGA.SA2_name_2021 == 'Port Melbourne Industrial') |
(population_LGA.SA2_name_2021 == 'Carlton North - Princes Hill')]
Next set Port Melb Industrial area to 0 since there are almost no dwellings there, possibly unoccupied. The population counts are likely coming from outside the City of Melbourne. The Carlton area will also be zeroed.
idx = population_trim.index[population_trim['SA2_name_2021'] == 'Port Melbourne Industrial'][0]
population_trim.at[idx,'ERP_2021'] = 0
population_trim.at[idx,'Pop_density_2021_people_per_km2'] = 0
idx = population_trim.index[population_trim['SA2_name_2021'] == 'Carlton North - Princes Hill'][0]
population_trim.at[idx,'ERP_2021'] = 0
population_trim.at[idx,'Pop_density_2021_people_per_km2'] = 0
Let's visualise all the trimmed population information
pop_map2 = folium.Map(location=[-37.8155, 144.95], tiles='cartodbpositron', min_zoom=11, zoom_start=13)
map_df(LGA_shape_gdf,pop_map2,description="CoM LGA border", styleFunction={
'fillOpacity': 0,
'color': 'black',
'opacity': 1,
'weight': 3
})
map_df(population_trim.round(4), # Rounding to make numbers easier to read
pop_map2,
['Pop_density_2021_people_per_km2','ERP_2021','SA2_name_2021','Area_km2','geometry'],
['Pop density', 'Population', 'area name' , 'Area in km2'],
"Population Density",
{'fillOpacity': 0.5,
'opacity': 0, #ensure there is space to see the LGA border
'weight': 3},
# use the min and max value of the variable being color mapped
colorMap=cm.linear.YlGnBu_07.scale(population_trim['Pop_density_2021_people_per_km2'].min(), population_trim['Pop_density_2021_people_per_km2'].max()))
folium.LayerControl().add_to(pop_map2)
pop_map2
We can see that the population density is highest in Melbourne CBD North and South Bank, to be expected as those areas have a lot of high rise buildings.
To make this data more visible in small areas, we will need to create a grid to summarise both the population density and dwelling density into.
# create a grid for geometry
def create_grid(n_boxes = 16):
"""
Generates a grid using the LGA geopandas DataFrame
Parameters:
n_boxes (int): defaults to 16, number of boxes for the x/y length of the grid
Returns:
gdf_grid (GeoDataFrame): the grid as a geopandas dataframe
"""
LGA = LGA_shape_gdf.loc[:, ["geometry"]]
a, b, c, d = LGA.total_bounds
gdf_grid = gpd.GeoDataFrame(
geometry=[
shapely.geometry.box(minx, miny, maxx, maxy)
for minx, maxx in zip(np.linspace(a, c, n_boxes), np.linspace(a, c, n_boxes)[1:])
for miny, maxy in zip(np.linspace(b, d, n_boxes), np.linspace(b, d, n_boxes)[1:])
],
crs="epsg:4326",
)
# remove grid boxes created outside actual geometry
gdf_grid = gdf_grid.sjoin(LGA).drop(columns="index_right")
return gdf_grid
While creating visuals for the the individual metrics, we will also create a summary grid to be used at the end for the final walkability score. This will be much more detailed, with areas of approximately 200 x 200m2.
# create grids
summary_grid = create_grid(50)
pop_density_grid = create_grid()
# add the residential data to the grid
dwellings = residential_gdf.loc[:, ["geometry", "dwelling_number"]].sjoin(pop_density_grid)
pops = population_trim.loc[:, ["geometry", 'Pop_density_2021_people_per_km2']].sjoin(pop_density_grid)
dwellings2 = residential_gdf.loc[:, ["geometry", "dwelling_number"]].sjoin(summary_grid)
pops2 = population_trim.loc[:, ["geometry", 'Pop_density_2021_people_per_km2']].sjoin(summary_grid)
Next we use .dissolve(), a geopandas method that can be used to aggregate information for a given area.
# Aggregating by mean
pop_density_grid = pop_density_grid.join(
pops.dissolve(by="index_right", aggfunc="mean").drop(columns="geometry")
)
# Since this is a count metric, this will be summed
pop_density_grid = pop_density_grid.join(
dwellings.dissolve(by="index_right", aggfunc="sum").drop(columns="geometry")
)
# Summary
summary_grid = summary_grid.join(
pops2.dissolve(by="index_right", aggfunc="mean").drop(columns="geometry")
)
summary_grid = summary_grid.join(
dwellings2.dissolve(by="index_right", aggfunc="mean").drop(columns="geometry")
)
# Removing null rows
pop_density_grid_filt = pop_density_grid.dropna()
pop_density_grid_filt.head()
| geometry | Pop_density_2021_people_per_km2 | dwelling_number | |
|---|---|---|---|
| 19 | POLYGON ((144.90956 -37.83061, 144.90956 -37.8... | 0.000000 | 1.0 |
| 42 | POLYGON ((144.91585 -37.79049, 144.91585 -37.7... | 45.632702 | 3.0 |
| 55 | POLYGON ((144.92214 -37.80052, 144.92214 -37.7... | 2532.370850 | 449.0 |
| 56 | POLYGON ((144.92214 -37.79551, 144.92214 -37.7... | 2555.187201 | 878.0 |
| 57 | POLYGON ((144.92214 -37.79049, 144.92214 -37.7... | 2555.187201 | 91.0 |
Next, we add a column with the relevant data normalised. This will make it easier to create a combined metric.
dwell_scaler = MinMaxScaler()
pop_scaler = MinMaxScaler()
# Scaling the normalised column into a range of [0,1]
pop_density_grid_filt[['dwelling_no_norm']] = dwell_scaler.fit_transform(pop_density_grid_filt[['dwelling_number']])
pop_density_grid_filt[['pop_density_norm']] = pop_scaler.fit_transform(pop_density_grid_filt[['Pop_density_2021_people_per_km2']])
Lastly, we need to combine the two metrics to create a single point of data to visualise residential density in the City of Melbourne, and do the same thing again with the summary_grid.
pop_density_grid_filt['pop_density'] = (pop_density_grid_filt['pop_density_norm'] + pop_density_grid_filt['dwelling_no_norm'])/2
summary_grid[['dwelling_no_norm']] = dwell_scaler.fit_transform(summary_grid[['dwelling_number']])
summary_grid[['pop_density_norm']] = pop_scaler.fit_transform(summary_grid[['Pop_density_2021_people_per_km2']])
summary_grid['pop_density'] = (summary_grid['pop_density_norm'] + summary_grid['dwelling_no_norm'])/2
# Rounding
pop_density_grid_filt = pop_density_grid_filt.round(4)
# Create colourmap matching scale of data
res_cmap = cm.linear.YlGnBu_07.scale(pop_density_grid_filt['pop_density'].min(), pop_density_grid_filt['pop_density'].max())
# Visualising
pop_density_grid_filt.explore('pop_density', cmap=res_cmap, location=[-37.8155, 144.95], tiles="cartodbpositron", zoom_start=13)
Intersection density is often used as a proxy for direct paths - more intersections typically means it is more likely there will be a direct route to a pedestrian's destination, making walking more appealing.
Although there is a road dataset available on the open data portal, it does not label all intersections, only the major road intersections. It also does not account for light timings.
Instead we can use the cost column in the pedestrian network dataset.
# Get the pedestrian network data
ped_url = 'https://data.melbourne.vic.gov.au/api/datasets/1.0/pedestrian-network/alternative_exports/pedestrian_network_zip/'
ped_response = requests.get(url=ped_url)
# Check the names of the files to get the one we want
f = ZipFile(BytesIO(ped_response.content))
print(f.namelist())
['Property_centroid.json', 'Pedestrian_network.json']
ped = f.extract('Pedestrian_network.json')
with open('Pedestrian_network.json') as file:
pedestrian_gdf = gpd.read_file(file)
We can examine the metadata document to learn which columns are relevant.
The Otime and Ctime are opening and closing time, which are not actually accurate to some of the private pathways that go through places like shopping arcades. For this use case, we will ignore time of day.
COST refers to a suggested cost for modelling situations based on the type of object. This is useful as it includes some data about crossing time, although it is merely a classification into either short (15 secs) or long (30 secs) rather than a precise measurement.
The centroid connector is used for path modelling, applying a significant cost (500 minutes) to avoid modelling software trying to cut through buildings when pedestrians would typically use the footpath. We can safely filter those out for this use case. Similarly entrance conextor is a type of entrance to a building and unnecessary.
# Finding and fixing spelling errors
column_list = pedestrian_gdf['DESCRIPTION'].unique(); column_list
array(['Pestrian Footpath', 'Pedestrian Long wait Croosing',
'Short wait crosing', 'zebra', 'Tram crossing', 'Arcade', 'Lane',
'Entrance conextor', 'Centroid Connector'], dtype=object)
# Filter unnecessary columns
pedestrian_gdf = pedestrian_gdf.filter(['COST', 'Shape_Length', 'DESCRIPTION', 'TRAFFIC','geometry'])
# Create a dictionary of spelling mistakes to fix
spelling_fix = {
'Pestrian Footpath': 'Pedestrian Footpath',
'Pedestrian Long wait Croosing': 'Pedestrian Long Wait Crossing',
'Short wait crosing': 'Short Wait Crossing',
'zebra': 'Zebra',
'Tram crossing': 'Tram Crossing'}
pedestrian_gdf = pedestrian_gdf.replace(spelling_fix)
# Filtering
pedestrian_gdf = pedestrian_gdf[(pedestrian_gdf.DESCRIPTION != 'Centroid Connector') & (pedestrian_gdf.DESCRIPTION != 'Entrance conextor')]
# Viewing cleaned dataset
pedestrian_gdf.head()
| COST | Shape_Length | DESCRIPTION | TRAFFIC | geometry | |
|---|---|---|---|---|---|
| 0 | 1.856134 | 123.742235 | Pedestrian Footpath | High Traffic | LINESTRING (144.98254 -37.84522, 144.98392 -37... |
| 1 | 0.031922 | 2.128135 | Pedestrian Footpath | High Traffic | LINESTRING (144.98042 -37.84496, 144.98041 -37... |
| 2 | 3.098050 | 206.536685 | Pedestrian Footpath | Low Traffic | LINESTRING (144.98041 -37.84494, 144.97973 -37... |
| 3 | 1.408645 | 93.909699 | Pedestrian Footpath | Low Traffic | LINESTRING (144.98475 -37.84455, 144.98493 -37... |
| 4 | 0.515907 | 34.393783 | Pedestrian Footpath | Low Traffic | LINESTRING (144.98531 -37.84377, 144.98493 -37... |
Below we can already see some potential barriers to walking - costs for waiting at an intersection can often exceed the walking cost for an entire block to that intersection.
ped_map = folium.Map(location=[-37.8155, 144.95], tiles="cartodbpositron", zoom_start=14)
map_df(pedestrian_gdf,
ped_map,
['COST','DESCRIPTION', 'TRAFFIC','Shape_Length','geometry'],
['Cost', 'Description', 'Traffic' , 'Length'],
"Pedestrian network",
{'color': '#08af64', 'weight': 2, 'opacity':0.9})
ped_map